ABSTRACT

We give two results concerning the properties of state-space models with exponential
family observation distribution and conjugate state distribution. The first result gives a
simple and general interpretation of the parameters of the predictive state distribution in
terms of the observation forecast distribution. The second result shows how the first result
can be used to check the long-term model properties of recurrence and ergodicity for a class
of non-Gaussian observation distributions. In particular, these results apply to models with
Poisson, binomial and multinomial observation distributions.

KEYWORDS: Bayesian forecasting; Binomial time series; Multinomial time series; Poisson time
series; Recursive updating; Time series.

1. INTRODUCTION

The state-space approach provides a powerful formulation of many models for studying time
series in the time domain. This approach encompasses Kalman Filter and ARMA models (Harrison
and Stevens, 1976; Harvey, 1981, ch. 4) and has provided the standard approaches for treating
missing values and irregularly spaced data (Jones, 1980 and 1981) and for computing the
Gaussian likelihood for many models (Harvey and Philips, 1979). More recently, a Bayesian view
of state-space models has opened the way for modeling time series of non-Gaussian data. For
example, useful models have been suggested for time series of binomial or Poisson observations
(West, Harrison and Migon, 1985), and we have been exploring the case of compositional data
(Grunwald, Raftery and Guttorp, 1989). These models provide a simple and computationally
efficient framework for filtering and forecasting such series. In some cases they also allow
the incorporation of covariates, trends, seasonality and interventions in a natural way.

In this paper we present results concerning the properties of some non-Gaussian state-space
models. In particular, we study the recurrence and ergodicity of a class of models, and
illustrate the results with the Poisson, binomial and multinomial examples. We also give a
general interpretation of the parameters of the state distribution in terms of the observation
forecast distribution.

2.  NON-GAUSSIAN  STATE-SPACE  MODELING 

A general state-space model consists of three parts: an observation distribution, a state
distribution and a method for making state forecasts. We assume an exponential family
distribution for the observation and a state distribution that is conjugate to the observation
distribution, but leave the state prediction rule unspecified for the present.

Let $y_t = (y_1, \ldots, y_d)^T$ be a vector in $R^d$, and suppose that $y_t$ follows an
exponential family observation distribution with density

$$p(y_t \mid \theta_t) = \exp\{ y_t^T \theta_t - M(\theta_t) + S(y_t) \}\, I_Y(y_t). \tag{2.1}$$

Let $\Psi$ be the interior of the convex hull of the support $Y$ of the observation density, and
assume that $\Psi$ is a non-empty open set in $R^d$. Let
$\theta_t \in \Theta \equiv \{\theta \in R^d : M(\theta) < \infty\}$, the natural parameter space,
where $M(\theta) \equiv \log \int_Y \exp\{ y^T \theta + S(y) \}\, dy$. Assume that $\Theta$ is a
non-empty open set in $R^d$. These conditions hold for the cases of most practical interest,
where the observations have Poisson, binomial and multinomial distributions.
The conjugate density for the above observation distribution is

$$p(\theta_t \mid \sigma_{t|t}, \kappa_{t|t}) \propto \exp\left[ \sigma_{t|t} \{ \theta_t^T \kappa_{t|t} - M(\theta_t) \} \right]. \tag{2.2}$$

The index $t_1 | t_2$ attached to the parameters of this distribution denotes the parameter at
time $t_1$ given data and covariate information at all times up to and including $t_2$.

It follows from Theorem 1 of Diaconis and Ylvisaker (1979) that if $\sigma_{t|t} > 0$ and
$\kappa_{t|t} \in \Psi$, then the density (2.2) can be normalized to a probability density and,
for $\Theta = R^d$, these are the only parameter values for which the integral of the state
density is finite. When $\Theta \neq R^d$ there may be other parameter values for which the
density can be normalized. Usually in practice the observation density has some standard form
and then the natural conjugate distribution is well known; for example, the beta is conjugate
to the binomial, the gamma to the Poisson, and the normal to the normal when the variance is
known.

To completely specify the state-space model it remains only to describe a state prediction
rule, or a method for obtaining the predictive state density
$p(\theta_{t+1} \mid \sigma_{t+1|t}, \kappa_{t+1|t})$. These densities will act as the priors in
the sequential applications of Bayes' Theorem to follow, and are specified through relations
involving their parameters:

$$\sigma_{t|t} \to \sigma_{t+1|t}, \tag{2.3}$$

$$\kappa_{t|t} \to \kappa_{t+1|t}. \tag{2.4}$$

Such a rule may involve other parameters, covariate information or past data, and expresses
what we "expect" to happen in the interval $(t, t+1)$, during which no new data are observed.
This is where the temporal structure enters the model.

The model specification is now complete. Upon observing the new datum $y_{t+1}$, application
of Bayes' Theorem shows that the state posterior density,
$p(\theta_{t+1} \mid \sigma_{t+1|t+1}, \kappa_{t+1|t+1})$, is of the same form as (2.2) (the
conjugacy property) with parameters

$$\sigma_{t+1|t+1} = \sigma_{t+1|t} + 1,$$

$$\kappa_{t+1|t+1} = \kappa_{t+1|t} + g_{t+1}(y_{t+1} - \kappa_{t+1|t}) = (1 - g_{t+1})\,\kappa_{t+1|t} + g_{t+1}\, y_{t+1}. \tag{2.5}$$

In (2.5), $g_{t+1} = 1/\sigma_{t+1|t+1}$ is analogous to the gain in the usual Kalman Filter.
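
As an illustration of the updating step (2.5), the following minimal Python sketch maps the predictive parameters to the posterior parameters after one observation. The function name `conjugate_update` and the list-based vector representation are assumptions made here for illustration; they do not come from the paper.

```python
def conjugate_update(sigma_pred, kappa_pred, y):
    """One Bayes update as in (2.5).

    sigma_pred : predictive precision-like parameter sigma_{t+1|t} (> 0)
    kappa_pred : predictive location parameter kappa_{t+1|t}, a list of length d
    y          : new observation y_{t+1}, a list of length d
    Returns (sigma_{t+1|t+1}, kappa_{t+1|t+1}, gain g_{t+1}).
    """
    sigma_post = sigma_pred + 1.0
    gain = 1.0 / sigma_post  # g_{t+1} = 1 / sigma_{t+1|t+1}
    kappa_post = [(1.0 - gain) * k + gain * yi for k, yi in zip(kappa_pred, y)]
    return sigma_post, kappa_post, gain


# Hypothetical numbers: sigma_{t+1|t} = 3, kappa_{t+1|t} = 2, observation y_{t+1} = 6.
sigma, kappa, g = conjugate_update(3.0, [2.0], [6.0])
```

With these inputs the gain is 1/4, so the posterior location moves a quarter of the way from the forecast 2 toward the observation 6.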

The successive applications of Bayes' Theorem give recursions for the parameters of the
posterior state distribution, and this procedure is known as filtering. The filtered
distribution can often be thought of as the distribution of a signal that is to be observed
with (possibly non-additive) noise. When both observation and state distributions are normal
and the observation variance is assumed to be known, this procedure yields the usual Kalman
filter. The predictive distribution of the next observation, or forecast density, is obtained
by integrating out the state from the observation density (2.1), yielding

$$p(y_{t+1} \mid y^t) = \int_\Theta p(y_{t+1} \mid \theta_{t+1})\, p(\theta_{t+1} \mid y^t)\, d\theta_{t+1}. \tag{2.6}$$

If the state prediction rule given by equations (2.3) and (2.4) is deterministic, then
conditioning $\theta_{t+1}$ on $y^t$ is equivalent to conditioning on
$(\sigma_{t+1|t}, \kappa_{t+1|t})$. In some of the more common cases, this integration can be
done easily and a simple closed form for the forecast distribution is then available (negative
binomial for Poisson observations, beta-binomial for binomial observations and normal for
Gaussian observations), while in other cases the integration may not be possible. In the
following section we give a simple and general result for the mean of the forecast
distribution in terms of the parameters of the state distribution. This result is used in
proving the succeeding results concerning recurrence and ergodicity, and is also of interest
in its own right as a point forecast of the observation.

The results we give concerning recurrence and ergodicity are based on Tweedie (1975). For a
Markov chain $\{X_t\}$ on a normed space, he considers the quantity

$$\gamma_x \equiv E[\,|X_{t+1}| - |X_t| \mid X_t = x\,] = E[\,|X_{t+1}| \mid X_t = x\,] - |x|. \tag{2.7}$$

Intuitively, for "stable" processes, $\gamma_x$ would be negative for $x$ far from the center of
the space, since then the future observation would be expected to be closer to the center than
the present observation. In fact, he shows that, if $\{P(x, \cdot)\}$ is strongly continuous,
the process $\{X_t\}$ is

recurrent if and only if there is some $a > 0$ with $\gamma_x \le 0$ for all $x$ such that
$|x| > a$; (2.8)

ergodic if and only if there is some $a > 0$ such that $\gamma_x \le -c$ for all $x$ satisfying
$|x| > a$, for some positive constant $c$, and $\gamma_x$ is bounded above for all $x$
satisfying $|x| \le a$. (2.9)

In particular, when $|x| = |x_1| + \cdots + |x_d|$ is the $L^1$ norm and when the components of
$X_t$ are all either always positive or always negative, then $\gamma_x$ is easily computed and
direct evaluation of the long-term properties of the model is possible. These ideas are the
topic for the next section.
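
To make the drift quantity (2.7) concrete, the sketch below evaluates $\gamma_x$ in closed form for a hypothetical one-dimensional chain $X_{t+1} = \phi X_t + \varepsilon_t$ with $\varepsilon_t \sim N(0, s^2)$. This chain is not one of the models of this paper; it is chosen only because $E|X_{t+1}|$ is then the mean of a folded normal distribution, which the standard library can compute. The names `expected_abs_normal` and `drift_ar1` are assumptions for illustration.

```python
import math


def expected_abs_normal(mu, sd):
    """E|Z| for Z ~ N(mu, sd^2), i.e. the mean of a folded normal."""
    return (sd * math.sqrt(2.0 / math.pi) * math.exp(-mu * mu / (2.0 * sd * sd))
            + mu * math.erf(mu / (sd * math.sqrt(2.0))))


def drift_ar1(x, phi, sd=1.0):
    """gamma_x of (2.7) for the illustrative chain X_{t+1} = phi * X_t + N(0, sd^2)."""
    return expected_abs_normal(phi * x, sd) - abs(x)


# Far from the origin the contracting chain (phi = 0.5) has strongly negative drift,
# whereas with phi = 1 the drift is never negative (Jensen's inequality).
far_contracting = drift_ar1(10.0, 0.5)
at_origin_walk = drift_ar1(0.0, 1.0)
```

In this illustration the drift of the contracting chain at $x = 10$ is close to $-5$, which is the behaviour that conditions (2.8) and (2.9) look for outside a bounded set.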


3.  THE  FORECAST  MEAN,  RECURRENCE  AND  ERGODICITY 

In this section, we state and prove the two results of this paper. The first result gives a
simple and general expression for the forecast mean. The second one uses this to compute the
quantity $\gamma_x$ of (2.7) in some cases, and thus allows a direct check of recurrence and
ergodicity.

Theorem 1: Let $\Theta \subseteq R^d$ be open and suppose that (2.2), (2.3) and (2.4) hold with
$\sigma_{t+1|t} > 0$ and $\kappa_{t+1|t} \in \Psi$. Then

$$E[y_{t+1} \mid y^t] = \kappa_{t+1|t}, \tag{3.1}$$

where $y^t = (y_1, \ldots, y_t)$.

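Proof: We compute the forecast mean directly and then apply Theorem 2 of Diaconis and
Ylvisaker (1979).

$$E[y_{t+1} \mid y^t] = \int_Y y_{t+1}\, p(y_{t+1} \mid y^t)\, dy_{t+1}
= \int_Y y_{t+1} \left[ \int_\Theta p(y_{t+1} \mid \theta_{t+1}, y^t)\, p(\theta_{t+1} \mid y^t)\, d\theta_{t+1} \right] dy_{t+1}. \tag{3.2}$$

By Fubini's theorem, whose application here is justified by the regularity of the exponential
families and the finiteness of the integral in (3.2) as shown below, this is equal to

$$\int_\Theta \left[ \int_Y y_{t+1}\, p(y_{t+1} \mid \theta_{t+1}, y^t)\, dy_{t+1} \right] p(\theta_{t+1} \mid y^t)\, d\theta_{t+1}
= \int_\Theta E[y_{t+1} \mid \theta_{t+1}, y^t]\, p(\theta_{t+1} \mid y^t)\, d\theta_{t+1}$$

$$= \int_\Theta \nabla M(\theta_{t+1})\, p(\theta_{t+1} \mid y^t)\, d\theta_{t+1}
= E[\nabla M(\theta_{t+1}) \mid y^t], \tag{3.3}$$

where $\nabla M(\theta) = \left( \partial M(\theta)/\partial\theta_1, \ldots, \partial M(\theta)/\partial\theta_d \right)^T$.
By Diaconis and Ylvisaker (1979, Theorem 2), the right-hand side of (3.3) is equal to
$\kappa_{t+1|t}$. The equality

$$E[y_{t+1} \mid \theta_{t+1}, y^t] = \nabla M(\theta_{t+1})$$

is standard exponential family theory (e.g. Bickel and Doksum, 1977, p.71) and is also given
in equation (2.2i) of Diaconis and Ylvisaker (1979). □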
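
The identity $E[\nabla M(\theta_{t+1}) \mid y^t] = \kappa_{t+1|t}$ used in the proof can be checked numerically in a simple case. The sketch below assumes Poisson observations, for which $d = 1$ and $M(\theta) = e^\theta$, and integrates $e^\theta$ against the unnormalized conjugate density (2.2) with a trapezoid rule. The function names, the integration limits and the parameter values are illustrative assumptions, not part of the paper.

```python
import math


def state_density_unnorm(theta, sigma, kappa):
    """Unnormalized conjugate density (2.2) for Poisson data, where M(theta) = exp(theta)."""
    return math.exp(sigma * (theta * kappa - math.exp(theta)))


def expected_grad_M(sigma, kappa, lo=-20.0, hi=5.0, steps=20000):
    """Trapezoid-rule estimate of E[M'(theta)] = E[exp(theta)] under (2.2).

    The limits lo and hi are assumed wide enough that the density is negligible outside.
    """
    h = (hi - lo) / steps
    num = 0.0
    den = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        dens = state_density_unnorm(theta, sigma, kappa)
        num += w * math.exp(theta) * dens
        den += w * dens
    return num / den  # the step size h cancels in the ratio


# Hypothetical predictive parameters sigma_{t+1|t} = 4 and kappa_{t+1|t} = 2.5.
forecast_mean = expected_grad_M(sigma=4.0, kappa=2.5)
```

Under these assumptions the numerical value agrees with $\kappa_{t+1|t} = 2.5$ to well within the quadrature error, as (3.1) requires.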

Remark: One way to use the result (3.1) is to consider the difference
$R_t \equiv y_{t+1} - \kappa_{t+1|t}$ as a residual at time $t$. This provides a starting point
for model checking.

The second result of our paper concerns the long-term properties of some exponential family
state-space models. This result is a direct application of Theorem 1 and the results of
Tweedie (1975), as reported in (2.7), (2.8) and (2.9) above, to the process
$\{X_t\} \equiv \{\kappa_{t|t}\}$ of the parameters of the state density.

Theorem 2: Suppose that $\Theta \subseteq R^d$ is open and that $\Psi \subseteq R^d_+$ or
$\Psi \subseteq R^d_-$, where $R^d_\pm = \{x \in R^d : \pm x_i \ge 0 \text{ for } i = 1, \ldots, d\}$.
Also, assume that (2.2), (2.3) and (2.4) hold where $\sigma_{t+1|t} > 0$ and
$\kappa_{t+1|t} \in \Psi$ is a deterministic, time-invariant function of $\kappa_{t|t}$. Then

$$\gamma_k = s\, u^T \left( \kappa_{t+1|t}(k) - k \right), \tag{3.4}$$

and the long-term model properties are given by (2.8) and (2.9). In (3.4), $s = 1$ if
$\Psi \subseteq R^d_+$ and $s = -1$ if $\Psi \subseteq R^d_-$, and $u$ is a $d$-vector of ones.

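Proof: Using the $L^1$ norm, direct calculation of (2.7) for the Markov process
$\{\kappa_{t|t}\}$ gives

$$\gamma_k = E[\,|\kappa_{t+1|t+1}| - |\kappa_{t|t}| \mid \kappa_{t|t} = k\,]$$

$$= s\, u^T E[\,\{(1 - g_{t+1})\,\kappa_{t+1|t} + g_{t+1}\, y_{t+1}\} - \kappa_{t|t} \mid \kappa_{t|t} = k\,]$$

$$= s\, u^T \{ (1 - g_{t+1})\,\kappa_{t+1|t}(k) + g_{t+1}\, E[y_{t+1} \mid \kappa_{t|t} = k] - k \}. \tag{3.5}$$

By the definition of the state-space model,
$p(\theta_{t+1} \mid y^t) = p(\theta_{t+1} \mid \sigma_{t|t}, \kappa_{t|t})$. Thus (2.6) yields
$p(y_{t+1} \mid y^t) = p(y_{t+1} \mid \sigma_{t|t}, \kappa_{t|t})$. It follows by Theorem 1 that
the conditional expectation in the second term on the right-hand side of (3.5) is equal to
$\kappa_{t+1|t}(k)$. We therefore obtain

$$\gamma_k = s\, u^T \{ \kappa_{t+1|t}(k) - k \}. \quad □$$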

Theorem 2 holds when the components of $y_t$ are all positive or all negative and when the
state prediction rule is deterministic and time-invariant. While these conditions may seem
somewhat restrictive, there are many models whose long-term properties can be studied by these
methods, as we illustrate with the examples below.
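
The drift formula (3.4) is simple to evaluate once a prediction rule is written as a function of $k$. The following sketch assumes the rule is supplied as a Python callable returning $\kappa_{t+1|t}(k)$ as a list; the names `drift`, `steady_rule` and `shrink_rule`, and the factor 0.8, are hypothetical choices for illustration.

```python
def drift(rule, k, s=1):
    """gamma_k of (3.4): s * u^T (kappa_{t+1|t}(k) - k), with u a vector of ones.

    rule : callable mapping the list k to the list kappa_{t+1|t}(k)
    s    : +1 if Psi lies in the non-negative orthant, -1 if in the non-positive one
    """
    k_next = rule(k)
    return s * sum(kn - ki for kn, ki in zip(k_next, k))


def steady_rule(k):
    """kappa_{t+1|t} = kappa_{t|t}."""
    return list(k)


def shrink_rule(k):
    """A hypothetical contraction rule kappa_{t+1|t} = 0.8 * kappa_{t|t}."""
    return [0.8 * ki for ki in k]


gamma_steady = drift(steady_rule, [1.0, 2.0])
gamma_shrink = drift(shrink_rule, [1.0, 2.0])
```

The first rule gives zero drift for every $k$, while the second gives a drift of $-0.2\,(k_1 + k_2)$, which decreases without bound as $|k|$ grows.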

4.  EXAMPLES 

4.1 The steady model

The most common state prediction rule is $\kappa_{t+1|t} = \kappa_{t|t}$ and
$\sigma_{t+1|t} = \delta\,\sigma_{t|t}$ with $0 < \delta < 1$, which Smith (1979) uses to define a
"steady" model for non-Gaussian state-space models. For this steady model, when the
conditions of Theorem 2 apply, $\gamma_k = 0$, so that any steady exponential family model of
this type is recurrent, by (2.8). Such a model is ergodic, however, if and only if the space
$\Psi$ is bounded above in all dimensions, by (2.9). This last can be seen by taking, when
$\Psi$ is bounded, $a = \sup\{ |x| : x \in \Psi \}$, so that (2.9) is satisfied trivially. In
fact, these results show that any exponential family state-space model satisfying the
conditions of the theorems and having $\Psi$ bounded in all dimensions is both ergodic and
recurrent.

These results for exponential family state-space models satisfying Theorem 2 are in contrast
to the normal steady model defined when (2.1) and (2.2) are Gaussian. This normal steady model
is neither ergodic nor recurrent, as can be seen either by noting that it is equivalent to a
normal random walk (which has neither stability property) observed with Gaussian error, or by
noting that there is an equivalent ARIMA(0,1,1) model, which is neither ergodic nor recurrent.

4.2  A  Poisson  time  series  model 

The conjugate state density for the Poisson with mean $\lambda_t$ specifies that $\lambda_t$
has a gamma distribution with inverse-scale (rate) parameter $\sigma_{t|t}$ and shape parameter
$\sigma_{t|t}\kappa_{t|t}$. The forecast distribution can be computed directly from (2.6) to be
negative binomial with mean $\kappa_{t+1|t}$ (as in Theorem 1) and variance

$$\frac{\sigma_{t+1|t} + 1}{\sigma_{t+1|t}}\, \kappa_{t+1|t}.$$

There does not appear to be a general result such as Theorem 1 for forecast variances.
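
The forecast mean and variance for the Poisson case can be checked by summing the negative binomial probabilities directly. The sketch below assumes a gamma state distribution with shape $\sigma\kappa$ and rate $\sigma$, where $\sigma$ and $\kappa$ stand for $\sigma_{t+1|t}$ and $\kappa_{t+1|t}$; the function names, the truncation point `y_max` and the parameter values are illustrative assumptions.

```python
import math


def forecast_pmf(y, sigma, kappa):
    """Negative binomial forecast pmf obtained by mixing Poisson(lambda) over
    lambda ~ Gamma(shape = sigma * kappa, rate = sigma)."""
    a = sigma * kappa
    log_p = (math.lgamma(y + a) - math.lgamma(a) - math.lgamma(y + 1)
             + a * math.log(sigma / (sigma + 1.0))
             - y * math.log(sigma + 1.0))
    return math.exp(log_p)


def forecast_moments(sigma, kappa, y_max=2000):
    """Mean and variance of the forecast distribution by direct summation.

    y_max is assumed large enough that the truncated tail is negligible.
    """
    probs = [forecast_pmf(y, sigma, kappa) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Hypothetical predictive parameters sigma_{t+1|t} = 4 and kappa_{t+1|t} = 2.5.
mean, var = forecast_moments(sigma=4.0, kappa=2.5)
```

For these values the summation reproduces the mean $\kappa = 2.5$ and the variance $(\sigma + 1)\kappa/\sigma = 3.125$.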

The conditions of Theorem 2 apply (with $s = 1$) for appropriate state prediction rules. The
steady model is recurrent but not ergodic by Section 4.1, since $\Psi$ is unbounded. The
theorem would also apply to other models specifying particular types of growth or decay. For
example, for Poisson observations, $\kappa_{t+1|t} = \lambda\,\kappa_{t|t}$, along with the
discounting $\sigma_{t+1|t} = \delta\,\sigma_{t|t}$ as before, specifies exponential decay
($0 < \lambda < 1$), exponential growth ($1 < \lambda$) or the steady model ($\lambda = 1$). Then
$\gamma_k = (\lambda - 1)k$, so that the process is not recurrent and hence not ergodic if
$\lambda > 1$. Also, the process is ergodic and hence recurrent if $\lambda < 1$. For
$\lambda \neq 1$, these models do not appear to have been studied in the literature, and
Theorem 1 motivates them by giving an interpretation of the state prediction rules that define
them. They may be useful for modeling the evolution of population size over time. They have
the advantage of handling zero observations in a natural way, and they are flexible enough to
be used together with other approaches such as threshold modeling. Care is needed in defining
such models, since $\kappa_{t+1|t}$ must remain in $\Psi$.
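
A minimal sketch of the resulting Poisson filter, combining the growth or decay rule with the update (2.5), is given below. The function name `poisson_filter`, the starting values and the short data series are hypothetical; the check that the forecast mean stays positive reflects the requirement that it remain in $\Psi$, here taken to be the positive half-line.

```python
def poisson_filter(ys, sigma0, kappa0, lam=1.0, delta=0.9):
    """Run the recursions (2.3)-(2.5) with kappa_{t+1|t} = lam * kappa_{t|t}
    and sigma_{t+1|t} = delta * sigma_{t|t}.

    Returns the list of one-step forecast means kappa_{t+1|t}, one per observation.
    """
    sigma, kappa = sigma0, kappa0
    forecasts = []
    for y in ys:
        sigma_pred = delta * sigma   # (2.3): discounting
        kappa_pred = lam * kappa     # (2.4): growth (lam > 1) or decay (lam < 1)
        if kappa_pred <= 0.0:
            raise ValueError("forecast mean left the positive half-line")
        forecasts.append(kappa_pred)
        sigma = sigma_pred + 1.0     # (2.5): posterior parameters
        gain = 1.0 / sigma
        kappa = (1.0 - gain) * kappa_pred + gain * y
    return forecasts


# Hypothetical counts, including a zero, under a decay rule with lam = 0.5.
forecasts = poisson_filter([4, 0, 1], sigma0=2.0, kappa0=4.0, lam=0.5, delta=0.5)
```

With these inputs the forecast means are 2.0, 1.5 and 0.375; the zero observation pulls the state toward zero without taking it out of the parameter space.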

4.3 Models for binomial and multinomial time series

We now consider the case where the observation distribution is multinomial, or binomial as a
special case, with the number of trials $n$ independent of time. Then all of the conditions of
the Theorems are satisfied (with $s = 1$) provided that the multinomial is expressed in a
non-degenerate form in terms of the first $d$ components for a $(d+1)$-component multinomial.
The natural conjugate to the multinomial distribution is the Dirichlet distribution with
parameters

$$\sigma_{t|t}\kappa^{1}_{t|t},\ \ldots,\ \sigma_{t|t}\kappa^{d}_{t|t},\ \sigma_{t|t}\Big(n - \sum_{i=1}^{d} \kappa^{i}_{t|t}\Big),$$

and again the forecast distribution is a familiar one, the Dirichlet-multinomial (Mosimann,
1962). The forecast mean is given by Theorem 1, while the forecast variance of the $j$'th
component of the multinomial observation is

$$\frac{\sigma_{t+1|t} + 1}{n\,\sigma_{t+1|t} + 1}\, \kappa^{j}_{t+1|t}\,\big(n - \kappa^{j}_{t+1|t}\big).$$

As mentioned in Section 4.1, any multinomial state-space model is both ergodic and recurrent.
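
For the binomial special case ($d = 1$) the forecast distribution is beta-binomial, and its moments can be checked by direct summation. The sketch below assumes a beta state distribution with parameters $a = \sigma\kappa$ and $b = \sigma(n - \kappa)$, where $\sigma$ and $\kappa$ stand for $\sigma_{t+1|t}$ and $\kappa_{t+1|t}$; the function names and the numerical values are illustrative assumptions. The standard beta-binomial variance $n a b (a + b + n) / \{(a + b)^2 (a + b + 1)\}$ reduces under this parametrization to $(\sigma + 1)\kappa(n - \kappa)/(n\sigma + 1)$.

```python
import math


def beta_binomial_pmf(y, n, sigma, kappa):
    """Forecast pmf for binomial(n, p) data with p ~ Beta(sigma*kappa, sigma*(n - kappa))."""
    a = sigma * kappa
    b = sigma * (n - kappa)
    log_p = (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
             + math.lgamma(y + a) + math.lgamma(n - y + b) - math.lgamma(n + a + b)
             + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    return math.exp(log_p)


def beta_binomial_moments(n, sigma, kappa):
    """Mean and variance of the forecast distribution by summing over y = 0, ..., n."""
    probs = [beta_binomial_pmf(y, n, sigma, kappa) for y in range(n + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Hypothetical values: n = 10 trials, sigma_{t+1|t} = 2, kappa_{t+1|t} = 3.
bb_mean, bb_var = beta_binomial_moments(n=10, sigma=2.0, kappa=3.0)
```

Here the summation returns a mean of 3, in agreement with Theorem 1, and a variance of $3 \cdot 3 \cdot 7 / 21 = 3$.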